Tencent Launches First Open Source Multimodal Large Language Model VITA for Seamless Communication with Users
Tencent's YouTu Lab and other institutions have released VITA, billed as the first open-source multimodal large language model of its kind, with the aim of bridging the gap in processing Chinese dialects. Built on the Mixtral 8×7B model, VITA expands the model's Chinese vocabulary and undergoes bilingual instruction fine-tuning, giving it command of both English and Chinese. Key features include:

1. **Multimodal Understanding**: VITA can process video, images, text, and audio. According to its developers, no earlier open-source model handled all four modalities.
2. **Natural Interaction**: No wake word is required. VITA can respond immediately while keeping its communication polite and non-intrusive.